METHOD AND APPARATUS FOR SPEECH RECOGNITION
USING LATENT SEMANTIC ADAPTATION 



BACKGROUND OF THE INVENTION

Field of the Invention 

The present invention relates generally to pattern recognition. More particularly, 
this invention relates to speech recognition systems using latent semantic analysis. 

Copyright Notice/Permission 

A portion of the disclosure of this patent document contains material that is
subject to copyright protection. The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document or the patent disclosure as it appears in
the Patent and Trademark Office patent file or records, but otherwise reserves all
copyright rights whatsoever. The following notice applies to the software and data as
described below and in the drawings hereto: Copyright © 2001, Apple Computer, Inc.,
All Rights Reserved.

Background 

As computer systems have evolved, the desire to use such systems for pattern
recognition has grown. Typically, the goal of pattern recognition systems is to quickly
provide accurate recognition of input patterns. One type of pattern recognition system is
a voice recognition system, which attempts to accurately identify a user's speech.
Another type of pattern recognition system is a handwriting recognition system. A speech
recognizer discriminates among acoustically similar segments of speech to recognize
words, while a handwriting recognizer discriminates among strokes of a pen to recognize
words.

An important advancement in speech recognition technology is the use of
semantic pattern recognition known as semantic language modeling. Semantic language
modeling uses the context of the spoken words to decide which words are most likely to
appear next, the context referring to the domain or subject matter of the words as well as
the style. For example, a speech recognition application using semantic language
modeling will favor the word sequence "recognize speech" over "wreck a nice beach"
when the subject matter is speech processing, and vice versa when the subject matter has
to do with vacations at the beach.

In semantic language modeling, the domain and style of the spoken words are
captured using latent semantic analysis (LSA). LSA is a modification of a paradigm that
was first formulated in the context of information retrieval and reveals meaningful
associations in language based on semantic patterns previously observed in a corpus of
language representative of a particular domain and style, for example, a training corpus
having to do with speech processing vs. vacations at the beach. The semantic patterns are
word-document co-occurrences that appear in the training corpus, where the corpus is
composed of a collection of one or more documents that contain paragraphs and
sentences or other collections of words representative of the domain and style.

The semantic knowledge represented by the semantic patterns is encapsulated in a
continuous vector space, referred to as the LSA space, by mapping those word-document
co-occurrences into corresponding word and document vectors that characterize the
position of the words and documents in the LSA space. During speech recognition, any
new words or documents are first mapped onto a point in the LSA space, and then
compared to the existing word and document vectors in the space using a similarity
measure, a process referred to as semantic inference. Those new words and documents
that map most closely to the existing word and document vectors in the LSA space are
recognized over those that do not.

A limitation in current implementations of speech recognition applications using
semantic language modeling is that the LSA space is a fixed semantic space. This means
that semantic patterns not observed in the training corpus cannot be captured and later
exploited during speech recognition. As a result, changes in the domain of the speech, or
even just changes in the style of the speech, may not be properly recognized. In the case
of financial news, for example, this means that an LSA-based speech recognition
application trained on a collection of documents, say, from the Wall Street Journal, will
not perform optimally on new documents from the Associated Press, and vice versa. The
use of a fixed semantic space is particularly deleterious in applications with many
heterogeneous domains, such as an information retrieval system, since no database is big
enough to contain a training corpus representative of all domains. It is also less than ideal
for horizontal (i.e. non-specialized) dictation applications, because the same user
typically adopts different styles in different contexts, for example the formal style of a
business letter vs. the informal style of a personal letter.

Distributed training seeks to overcome some of the limitations of a fixed semantic
space by creating a distinct semantic space for each usage condition. Thus, using the
financial news example, there would be one LSA space for the Wall Street Journal, and
another LSA space for the Associated Press. However, it is often impossible to predict
ahead of time which kind(s) of text the end user will want to process, and even when that
can be done, for most narrowly defined contexts and styles it may be challenging to
gather enough data to reliably train the speech recognition system.

Explicit modeling also seeks to overcome some of the limitations of a fixed
semantic space by incorporating a task (i.e. domain) and/or style component into the LSA
paradigm. For example, it has been suggested to define a stochastic matrix to account for
the way style modifies the frequency of words (C.H. Papadimitriou, P. Raghavan, H.
Tamaki, and S. Vempala, "Latent Semantic Indexing: A Probabilistic Analysis," in Proc.
17th ACM Symp. Princip. Database Syst., Seattle, WA, 1998). However, this approach
makes the assumption (largely invalid) that the influence of style on word frequency is
independent of the underlying domain.

Another approach to the problem of a fixed semantic space is to re-compute the
LSA space to account for the new words and documents as they become available. One
way is simply to re-compute the LSA space from scratch, referred to as full
re-computation. Another way is to re-compute the LSA space from scratch, but keeping the
dimension of the LSA space constant, referred to as constant dimension re-computation.
But full or constant dimension re-computation requires significant additional processing.
The additional processing is undesirable since it consumes additional central processing
unit (CPU) cycles and degrades responsiveness.

Yet another approach to the problem of a fixed semantic space is to adapt the LSA
space to account for the new documents and new words in the new documents as they
become available by using traditional "folding-in" to incorporate new variants in the
existing LSA space, referred to as baseline adaptation. While less computationally
intensive, baseline adaptation results in unacceptably high speech misclassification
error rates. What is needed, therefore, is an improved method and
apparatus for using semantic language modeling in a speech recognition system to more
accurately recognize speech.

SUMMARY OF THE INVENTION

A method and apparatus for speech recognition using latent semantic adaptation is
described herein. According to one aspect of the present invention, a method for
recognizing speech comprises generating a latent semantic analysis (LSA) space for a
collection of documents and the words appearing in those documents, and continually
adapting the LSA space with new documents as they become available. Adaptation of the
LSA space is optimally two-sided, taking into account the new documents, as well as the
new words that appear in those new documents. Alternatively, adaptation is one-sided,
taking into account the new documents, but discarding any new words appearing in those
new documents.

According to one aspect of the present invention, a computer-readable medium
has executable instructions to cause a computer to perform a method to generate a speech
recognition database comprising generating an LSA space for a collection of documents
and the words appearing in those documents, and continually adapting the LSA space
with new documents or both new documents and new words, as they become available.

According to one aspect of the present invention, an apparatus for recognizing
speech includes an adapted LSA space generator. The adapted LSA space generator
generates an LSA space from a collection of documents and the words appearing in those
documents, and continually adapts the LSA space with new documents or both new
documents and new words, as they become available.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the 
figures of the accompanying drawings, in which like references indicate similar elements 
and in which: 

FIG. 1 is a block diagram that illustrates the use of latent semantic adaptation in
the context of a speech recognition system using semantic inference, in accordance with
one embodiment of the present invention;

FIG. 2 is a block diagram overview of some of the components of latent semantic
analysis (LSA), in accordance with one embodiment of the present invention;

FIG. 3 is an overview of selected components of the basic LSA paradigm, in
accordance with one embodiment of the present invention;

FIG. 4 is an overview of selected components of the adaptive LSA paradigm, in
accordance with one embodiment of the present invention;

FIG. 5 is an overview of selected components of the matrix transformation of the
adaptive LSA paradigm, in accordance with one embodiment of the present invention;

FIG. 6 is an overview of selected components of the vector transformation of the
adaptive LSA paradigm, in accordance with one embodiment of the present invention;

FIG. 7 is an overview of selected components of prior art baseline adaptation;

FIG. 8 is a flowchart illustrating the process followed in generating and adapting
an LSA space, in accordance with one embodiment of the present invention;

FIG. 9 illustrates one embodiment of a computing device suitable for use with one
embodiment of the present invention; and

FIG. 10 illustrates a graph comparing the misclassification error rates of an
embodiment of the present invention as compared to other methods of re-computation
and adaptation of the LSA space.

DETAILED DESCRIPTION

In the following description, various aspects of the present invention will be
described. However, it will be understood by those skilled in the art that the present
invention may be practiced with only some or all aspects of the present invention. For
purposes of explanation, specific numbers, materials and configurations are set forth in
order to provide a thorough understanding of the present invention. However, it will also
be apparent to those skilled in the art that the present invention may be practiced without
these specific details.

Parts of the description will be presented in terms of operations performed by a
computer system, using terms such as data, flags, bits, values, characters, strings,
numbers and the like, consistent with the manner commonly employed by those skilled in
the art to convey the substance of their work to others skilled in the art. As is well
understood by those skilled in the art, these quantities take the form of electrical,
magnetic, or optical signals capable of being stored, transferred, combined, and otherwise
manipulated through mechanical and electrical components of the computer system; and
the term computer system includes general purpose as well as special purpose data
processing machines, systems, and the like, that are standalone, adjunct or embedded.

Additionally, various operations will be described as multiple discrete steps in
turn in a manner that is helpful in understanding the present invention. However, the
order of description should not be construed as to imply that these operations are
necessarily order dependent, in particular, the order of their presentations.

The present invention provides a method and apparatus for speech recognition
using semantic language modeling. Specifically, the method and apparatus use latent
semantic adaptation to reduce the misclassification error rate in speech recognition
applications without sacrificing computational efficiency.

Latent semantic adaptation is a process of using latent semantic analysis (LSA) to
capture the semantic patterns appearing in a training corpus of language by mapping them
into a continuous vector space, referred to as an LSA space, and continually adapting the
LSA space with new semantic patterns as they appear over time. The training corpus is a
collection of documents, where documents are instances of sentences, phrases, or other
word groupings representative of a particular domain and style of language composition.

Because semantic patterns that are not present in the training corpus cannot be
captured, current implementations of semantic language modeling using LSA exhibit a
relatively high sensitivity to changes in both the domain and style of language
composition when using a particular speech recognition application. For example, as
noted previously in the financial news example, an LSA-based speech recognition
application trained on a collection of documents, say, from the Wall Street Journal, will
not perform optimally on new documents from the Associated Press, and vice versa.

In accordance with the method and apparatus of the present invention, however,
the incremental adaptation of the LSA space improves performance by continually
modifying the LSA space on the basis of new words and documents as they become
available, where the new words and documents represent changes in either the domain or
style of composition. As a result, any change in domain and/or style gradually gets
reflected in the evolution of the LSA space, so that even new documents that do not
closely conform to the training corpus (e.g. documents that contain several new words)
can still be successfully processed.

Among other advantages, latent semantic adaptation implemented in accordance
with the method of the present invention accommodates new documents of virtually any
size and number, is capable of taking advantage of out-of-vocabulary words present in the
new documents, and is computationally efficient since it does not require re-computation
of the LSA space or multiple matrix inversions. The latent semantic adaptation of the
present invention is applicable to all applications of LSA, including semantic inference,
dictation, information retrieval, and word and document clustering.

FIG. 1 is a block diagram that illustrates the use of latent semantic adaptation in
accordance with one embodiment of the present invention, in the context of an LSA
application using semantic inference in a speech recognition system 100. A speech
recognition unit 104 receives an audio input 102 and, using acoustic models 106 and a
language model 108, generates a sequence of words and documents 110. The audio
input 102 is audio data that is input to the speech recognition system 100 and is intended
to represent any type of audio data. Typically, the audio input 102 is a digitized
representation of a human voice.

According to one embodiment of the present invention, the acoustic models 106
are hidden Markov models. Alternate embodiments can use different types of acoustic
models, and any of a variety of conventional acoustic models other than hidden Markov
models can be used. According to one embodiment of the present invention, the language
model 108 is a context-free grammar, such as a finite state grammar, that is a compact
way of representing an exhaustive list of each and every word that the speech recognition
system 100 can recognize. Alternate embodiments of system 100 can use different types
of language models, including a conventional n-gram statistical language model (such as
a bigram model, where n = 2), where the probability of every word depends only on the
n-1 previous words. Hidden Markov models, n-gram language models, and context-free
grammars are well known to those skilled in the art and thus will not be discussed
further except as they pertain to the present invention.

An adapted LSA space derivation unit 111 receives the sequence of words and
documents 110 and, using an LSA space 114, derives corresponding word and document
vectors to adapt the LSA space 114 to reflect any changes in domain and/or style of the
underlying language composition. The LSA space 114 is a continuous vector space
previously constructed from word and document vectors, referred to as semantic anchors
113, computed from a correlation matrix of word-document co-occurrences in a training
corpus of words and documents representative of a particular domain and style of
language composition.

A semantic classification unit 112 receives the sequence of words and documents
110 and uses semantic inference to classify the sequence of words and documents 110 by
determining the correlation between the sequence of words and documents 110 and one
or more semantic anchors 113 present in the adapted LSA space 114. The correlation is
the similarity between a vector corresponding to the sequence of words and documents
110 and the vectors corresponding to the semantic anchors 113 as determined by using a
similarity measure. The semantic classification unit 112 classifies the sequence of words
and documents 110 as corresponding to the semantic anchor 113 with the closest
correlation. The semantic classification unit 112 sends a semantic representation 116 of
the classified sequence of words and documents 110 to an application unit 118. The
application unit 118 receives the semantic representation 116 and generates an
application output 120.
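
The classification step described above can be sketched as a nearest-anchor search.
The sketch below is illustrative only: the description does not name a specific
similarity measure, so cosine similarity is an assumption, and the function name
classify and the two-dimensional anchor values are hypothetical.

```python
import numpy as np

def classify(vector, anchors):
    """Return (index of the closest anchor, all similarities).

    Assumption: closeness is measured by cosine similarity; the text only
    requires "a similarity measure".
    """
    vector = np.asarray(vector, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Cosine similarity between the input vector and every anchor row.
    norms = np.linalg.norm(anchors, axis=1) * np.linalg.norm(vector)
    sims = (anchors @ vector) / norms
    return int(np.argmax(sims)), sims

# Hypothetical semantic anchors, one row per class, in a 2-dimensional space.
anchors = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
best, sims = classify([0.9, 0.2], anchors)
```

Here the input vector points mostly along the first axis, so it is assigned to the
first anchor (index 0).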

The application unit 118 uses the semantic representation 116 to determine the
application output 120 that is generated in response to the audio input 102. It should be
noted that the speech recognition system 100 as described herein is used for illustrative
purposes only, and there may be any number of other applications other than semantic
inference for using the latent semantic adaptation of the present invention, including
dictation, information retrieval, and word and document clustering applications. Such
applications are well known to those skilled in the art, and thus will not be discussed
further except as they pertain to the present invention.

FIG. 2 illustrates an overview of some of the components underlying LSA. As
explained in more detail in FIG. 3, the basic LSA paradigm defines a mapping between a
training text corpus $\mathcal{T}$ 202, i.e., a collection of $N$ documents of interest
$d_1, d_2, \ldots, d_N$ 204, the underlying vocabulary $\mathcal{V}$ 206,
$|\mathcal{V}| = M$, i.e., the set of all the words $w_1, w_2, \ldots, w_M$ 208
appearing in the documents in $\mathcal{T}$, and a continuous vector space $\mathcal{S}$
210, whereby each word in $\mathcal{V}$ 206 and each document in $\mathcal{T}$ 202 is
represented by a vector $\bar{d}_1, \ldots, \bar{d}_N$, $\bar{w}_1, \ldots, \bar{w}_M$
212/214 in $\mathcal{S}$ 210. If $\mathcal{T}$ 202 is representative of general English,
for example, then $\mathcal{V}$ 206 would be the $M$ most frequent words in the language.
Typically, $M$ is on the order of a few thousand, and, depending on the application, $N$
varies between a few hundred and several million documents; $\mathcal{T}$ 202 might
comprise up to a billion words of text.

The continuous vector space $\mathcal{S}$ 210 is semantic in nature, because the
"closeness" of vectors in the space $\mathcal{S}$ 210 is determined by the overall pattern
of the language used in the training corpus $\mathcal{T}$ 202, as opposed to specific
constructs. Hence, two words whose representations are "close" (in some suitable metric)
tend to appear in the same kind of documents, whether or not they actually occur within
identical word contexts in those documents. Conversely, two documents whose
representations are "close" tend to convey the same semantic meaning, whether or not they
contain the same word constructs. More generally, word and document vectors 212/214
associated with words and documents 204/208 that are semantically linked are also "close"
in the space $\mathcal{S}$ 210. On the other hand, a semantic pattern not present in the
training corpus $\mathcal{T}$ 202 cannot be inferred from the space $\mathcal{S}$ 210,
hence the need to adapt the space to keep semantic knowledge as current as possible.

FIG. 3 illustrates selected components of the basic LSA paradigm 300 used to
construct the continuous vector space $\mathcal{S}$, referenced in FIG. 3 as LSA space
$\mathcal{S}$ 316. The LSA paradigm 300 first captures the semantic patterns of the
word-document co-occurrences that appeared in the training corpus $\mathcal{T}$ 202 by
constructing a word-document matrix $W$ 302 of dimension $M \times N$, whose entries
$w_{ij}$ 304 suitably reflect the extent to which word $w_i$ 208 appeared in document
$d_j$ 204, and then performing a singular value decomposition (SVD) of the word-document
matrix $W$ 302 having an order of decomposition of $R \ll \min(M, N)$, as in (1):

$$W = U S V^T, \tag{1}$$

where $U$ 306 is the $M \times R$ left singular matrix of row vectors $u_i$
($1 \le i \le M$), $S$ 308 is the $R \times R$ diagonal matrix of singular values
$s_1 \ge s_2 \ge \ldots \ge s_R > 0$, and $V^T$ is the transposition of $V$ 310, the
$N \times R$ right singular matrix of row vectors $v_j$ ($1 \le j \le N$).

The value of $R$ can vary depending on the values of $M$ and $N$, and by balancing
computational speed (associated with lower values of $R$) against accuracy (associated
with higher values of $R$). Typical values for $R$ range from 5 to 100.

As is well known to those skilled in the art, both left and right singular matrices
$U$ 306 and $V$ 310 are column-orthonormal, i.e., $U^T U = V^T V = I_R$ (the identity
matrix of order $R$). Thus, the column vectors of matrices $U$ 306 and $V$ 310 each define
an orthonormal basis for the space of dimension $R$ spanned by the $u_i$'s and $v_j$'s.
This is the LSA space $\mathcal{S}$ 316, in which the scaled row vectors
$\bar{u}_i = u_i S$ 318 and $\bar{v}_j = v_j S$ 320 (i.e. the rows of $US$ 312 and $VS$
314) characterize the position of word $w_i$ and document $d_j$. For this reason,
$\bar{u}_i$ 318 and $\bar{v}_j$ 320 are referred to as a word vector and a document
vector, respectively.
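
As a minimal sketch of the decomposition in (1) and the resulting word and document
vectors, the NumPy example below truncates a thin SVD to order R. The 4 x 3 count
matrix and the choice R = 2 are hypothetical values chosen for illustration. With
truncation, U S V^T is the best rank-R approximation of W rather than W itself.

```python
import numpy as np

# Hypothetical word-document matrix W: M = 4 words (rows), N = 3 documents (columns).
W = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 0.0]])
R = 2  # order of the decomposition, R <= min(M, N)

# Thin SVD; NumPy returns singular values in descending order.
U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)

U = U_full[:, :R]         # M x R left singular matrix, rows u_i
S = np.diag(s_full[:R])   # R x R diagonal matrix of singular values
V = Vt_full[:R, :].T      # N x R right singular matrix, rows v_j

word_vectors = U @ S      # row i is the word vector u_i S
doc_vectors = V @ S       # row j is the document vector v_j S
W_approx = U @ S @ V.T    # rank-R approximation of W
```

Both U and V come out column-orthonormal, so U.T @ U and V.T @ V equal the identity
matrix of order R up to floating-point error.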

Given the SVD from calculation (1), a particular document $d_j$ 204 (the $j$th column
of $W$ 302) can be determined based on the $j$th right singular vector $v_j$ according to
the following calculation:

$$d_j = U S v_j^T. \tag{2}$$

Further, based on calculation (2) and using well-known mathematical functions and
properties, the value of the $j$th right singular vector $v_j$ can be calculated according
to the following:

$$v_j = d_j^T U S^{-1}. \tag{3}$$

It is to be appreciated that the value $U S^{-1}$ does not change for different values of
$j$, and therefore the value $U S^{-1}$ can be pre-calculated and used during the
classification of new words and documents 110 referenced in FIG. 1. This pre-calculation
reduces the computation required to perform the functions of the semantic classification
unit 112, thereby increasing the speed of a speech recognition system 100 during operation.
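
The pre-calculation in (3) can be sketched as follows. This is an illustrative
example rather than the implementation of the semantic classification unit 112: the
matrix W and the order R = 2 are hypothetical, and the helper name US_inv is
introduced here only for the example.

```python
import numpy as np

# Hypothetical word-document matrix (M = 4 words, N = 3 documents).
W = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 0.0]])
R = 2

U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)
U, s, V = U_full[:, :R], s_full[:R], Vt_full[:R, :].T

# Pre-compute U S^{-1} once. Dividing by s broadcasts over rows, so column k of U
# is divided by the k-th singular value (all assumed nonzero).
US_inv = U / s

# Equation (3): v_j = d_j^T U S^{-1}, applied to the first column of W.
v0 = W[:, 0] @ US_inv

# The same pre-computed matrix maps every document column at once.
V_recovered = W.T @ US_inv
```

Applied to the training documents themselves, the mapping reproduces the rows of V,
which is a convenient sanity check before using US_inv on unseen documents.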

However, the LSA space $\mathcal{S}$ 316 generated by the basic LSA paradigm 300 is a
fixed semantic space and must eventually be re-trained to keep pace with additions and
changes not only to the domain, i.e. the underlying vocabulary $\mathcal{V}$ 206, but also
to changes in the style of the documents. Otherwise, the semantic classification error
rate of the semantic classification unit 112 begins to increase as the new words and
documents 110 vary from those contained in the original training corpus $\mathcal{T}$ 202.
However, re-training by re-computing the LSA space $\mathcal{S}$ 316 generated by the
basic LSA paradigm 300 is too computationally intensive to be of practical use.

FIG. 4 illustrates selected components of the adaptive LSA paradigm 400 using
latent semantic adaptation in accordance with an embodiment of the present invention.
The adaptive LSA paradigm 400 extends the basic LSA paradigm 300 so that some or all
of the data in new documents 110 are taken into account through incremental adaptation
of the original LSA space $\mathcal{S}$ 316 in a way that is computationally efficient.
Adaptation of the original LSA space $\mathcal{S}$ 316 ensures that the semantic
classification error rate of the semantic classification unit 112 does not substantially
increase as the new words and documents 110 vary from those contained in the original
training corpus $\mathcal{T}$ 202.

The adaptive LSA paradigm 400 relies on two assumptions. The first assumption
is that the dimension $R$ of the original LSA space $\mathcal{S}$ 316 is low enough that
none of the corresponding $R$ singular values are zero. This is typically the case since
the basic LSA paradigm 300 seeks to operate at the maximum possible dimensionality
reduction to increase computational speed without sacrificing accuracy. The second
assumption is that the transformation necessary to adapt the original LSA space
$\mathcal{S}$ 316 is invertible. If it were not, then a rather pathological situation
would arise: the inability to go back to the original LSA space $\mathcal{S}$ 316 by
simply forgetting the new data.

With reference to FIG. 4, if $n$ additional documents contain words drawn from the
original underlying vocabulary $\mathcal{V}$ 206 plus $m$ words previously unseen (i.e.
out-of-vocabulary words), then the adaptive LSA paradigm 400 constructs a word-document
matrix $\tilde{W}$ 402 of dimension $(M + m) \times (N + n)$ in the same manner as
described for generating matrix $W$ 302 in the basic LSA paradigm 300 in FIG. 3. Using the
same order of decomposition $R$, the SVD of $\tilde{W}$ 402 leads to:

$$\tilde{W} = \tilde{U} \tilde{S} \tilde{V}^T, \tag{4}$$

where $\tilde{U}$ 406 is the left singular matrix of dimension $(M + m) \times R$,
$\tilde{S}$ 408 is the diagonal matrix of dimension $R \times R$, and $\tilde{V}$ 410 is
the right singular matrix of dimension $(N + n) \times R$, each having the same
definitions and properties as described above for $W$, $U$, $S$, and $V$ in FIG. 3.

As shown in FIG. 4, the $m$ new words are gathered in the $m \times (N + n)$ matrix
$\tilde{C} = [C \; E]$ 422, and the $n$ new documents are gathered in the
$(M + m) \times n$ matrix $\tilde{D} = [D^T \; E^T]^T$ 424. $\tilde{U}$ 406 is expressed as
$[U_1^T \; U_2^T]^T$, where $U_1^T$ 436 is the transposition of the $M \times R$ left
singular block $U_1$ and $U_2^T$ 438 is the transposition of the $m \times R$ left
singular block $U_2$. $\tilde{V}$ 410 is expressed as $[V_1^T \; V_2^T]^T$, where $V_1^T$
439 is the transposition of the $N \times R$ right singular block $V_1$ and $V_2^T$ 440 is
the transposition of the $n \times R$ right singular block $V_2$. The new decomposition of
$\tilde{W}$ expressed in (4) leads to a different LSA space $\tilde{\mathcal{S}}$ 416, in
which the word and document vectors are now given by the scaled row vectors
$\tilde{u}_i \tilde{S}$ 418 and $\tilde{v}_j \tilde{S}$ 420 (i.e. the rows of
$\tilde{U}\tilde{S}$ 412 and $\tilde{V}\tilde{S}$ 414) to characterize the position of
word $w_i$ and document $d_j$.
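
The block layout described above can be sketched as follows, assuming the extended
matrix is arranged with the original matrix W in the upper-left corner, the new
documents restricted to old words (D) to its right, the new words restricted to old
documents (C) below it, and the new-word/new-document counts (E) in the lower-right
corner. All sizes and the random counts are hypothetical.

```python
import numpy as np

M, N, m, n, R = 4, 3, 1, 2, 2   # hypothetical sizes
rng = np.random.default_rng(0)

W = rng.integers(0, 3, size=(M, N)).astype(float)  # original M x N matrix
D = rng.integers(0, 3, size=(M, n)).astype(float)  # old words in new documents
C = rng.integers(0, 3, size=(m, N)).astype(float)  # new words in old documents
E = rng.integers(0, 3, size=(m, n)).astype(float)  # new words in new documents

W_ext = np.block([[W, D],
                  [C, E]])      # (M + m) x (N + n) extended matrix
C_ext = np.hstack([C, E])       # m x (N + n): all rows for the new words
D_ext = np.vstack([D, E])       # (M + m) x n: all columns for the new documents

# SVD of the extended matrix at the same order R, as in (4).
U_f, s_f, Vt_f = np.linalg.svd(W_ext, full_matrices=False)
U_ext, S_ext, V_ext = U_f[:, :R], np.diag(s_f[:R]), Vt_f[:R, :].T

# Partition the singular matrices by old/new words and old/new documents.
U1, U2 = U_ext[:M], U_ext[M:]   # M x R and m x R
V1, V2 = V_ext[:N], V_ext[N:]   # N x R and n x R
```

The partition is purely by row index, so U1 and V1 describe the original words and
documents in the new space, while U2 and V2 describe the new ones.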

FIG. 7 illustrates the prior art approach referred to as baseline adaptation, where
the distinction between the SVD in (1) of the original word-document co-occurrence
matrix $W$ 302 in FIG. 3 and the SVD in (4) of the extended word-document co-occurrence
matrix $\tilde{W}$ 402 in FIG. 4 is ignored by making the (obviously invalid)
assumption that the original LSA space $\mathcal{S}$ 316 is the same as the new LSA space
$\tilde{\mathcal{S}}$ 416. In other words, in baseline adaptation, the SVD in (1) is still
assumed to be valid even after the new documents become available, and the problem is
reduced to representing the new data in the original LSA space $\mathcal{S}$ 316.

Referring now to FIGS. 4 and 7, the baseline adaptation approach treats the
portions of the matrix $\tilde{W}$ 402 identified as $C$ 430 and $D$ 432 as merely
additional rows or columns extending the original matrix $W$ 302, and discards altogether
the portion of the extended matrix $\tilde{W}$ 402 identified as $E$ 434. This has the
effect of ignoring significant amounts of new data, including any out-of-vocabulary words
in the new documents.

Using the baseline adaptation approach, the representation of those portions of the
new data that will be added to the original LSA space $\mathcal{S}$ 316 is obtained from
the SVD in (1), with $C$ 430 and $D$ 432 expressed as follows:

$$C = Y S V^T, \tag{5}$$

$$D = U S Z^T, \tag{6}$$

where the $m \times R$ matrix $Y$ 426 and the $n \times R$ matrix $Z$ 428 are defined a
posteriori (as plug-ins), to satisfy the relationship. In essence, using the baseline
adaptation framework 700, the role of matrices $Y$ 426 and $Z$ 428 is to "extend" the
original matrices $U$ 306 and $V$ 310 to accommodate the new data. The original word and
document vectors $\bar{u}_i$ 318 and $\bar{v}_j$ 320 are still given by the rows of $US$
312 and $VS$ 314, but the new word and document vectors $\bar{y}_i$ 446 and $\bar{z}_j$
448 are given by the rows of $YS$ 442 and $ZS$ 444, respectively. From (5) and (6), these
are seen to be:

$$YS = CV, \tag{7}$$

$$ZS = D^T U. \tag{8}$$

The effect, illustrated in FIG. 7, is that the original LSA space $\mathcal{S}$ 316
becomes populated with the new data, i.e. the new word and document vectors $\bar{y}_i$
446 and $\bar{z}_j$ 448, hence the name "folding-in."
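
Equations (7) and (8) amount to two matrix products against the original singular
matrices. The sketch below is illustrative only: the function name fold_in and the
example matrix are hypothetical. As a sanity check, it folds in a copy of an existing
word and an existing document, which should land exactly on their original vectors.

```python
import numpy as np

def fold_in(W, C, D, R):
    """Baseline folding-in of new word rows C (m x N) and new document
    columns D (M x n) into the order-R LSA space of W (M x N)."""
    U_f, s_f, Vt_f = np.linalg.svd(W, full_matrices=False)
    U, s, V = U_f[:, :R], s_f[:R], Vt_f[:R, :].T
    YS = C @ V      # equation (7): new word vectors, m x R
    ZS = D.T @ U    # equation (8): new document vectors, n x R
    return U, s, V, YS, ZS

# Hypothetical word-document matrix (M = 4 words, N = 3 documents).
W = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 0.0]])

C = W[[0], :]   # a "new" word that duplicates word 0 (1 x N)
D = W[:, [1]]   # a "new" document that duplicates document 1 (M x 1)
U, s, V, YS, ZS = fold_in(W, C, D, R=2)

word_vectors = U * s   # rows of US
doc_vectors = V * s    # rows of VS
```

Note that nothing in U, s, or V changes: the original space is left untouched and the
new rows are simply projected into it, which is the limitation discussed in the text.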

A major drawback to the above-described baseline adaptation approach illustrated
in FIG. 7 is poor performance, since even when populated with the new word and
document vectors $\bar{y}_i$ 446 and $\bar{z}_j$ 448, the misclassification error rate
using the original LSA space $\mathcal{S}$ 316 is still high when the new words and
documents vary from the original training corpus $\mathcal{T}$ 202, e.g. when the new
documents contain several new words not in the original training corpus.

In contrast, the latent semantic adaptation approach of the present invention
achieves significant reductions in the misclassification error rate. Unlike baseline
adaptation, the latent semantic adaptation approach of the present invention recognizes
that there is an important distinction between the SVD in (1) of the original
word-document co-occurrence matrix $W$ 302 in FIG. 3 and the SVD in (4) of the extended
word-document co-occurrence matrix $\tilde{W}$ 402 in FIG. 4 that must be taken into
account since the original LSA space $\mathcal{S}$ 316 is not the same as the new LSA
space $\tilde{\mathcal{S}}$ 416. In other words, the SVD in (1) is no longer valid after
the new documents become available, so the problem is more than just representing the new
data in the original LSA space $\mathcal{S}$ 316. Therefore, in one embodiment, the latent
semantic adaptation approach treats the portions of the matrix $\tilde{W}$ 402 identified
as $C$ 430 and/or $D$ 432 in FIG. 4 as new data that must be accounted for in a new LSA
space $\tilde{\mathcal{S}}$ 416. In one embodiment, the portion of the matrix $\tilde{W}$
402 identified as $E$ 434 in FIG. 4 is also treated as new data that must be accounted for
in a new LSA space $\tilde{\mathcal{S}}$ 416.

In one embodiment of latent semantic adaptation, the scaled row vectors (i.e. the
rows of ŨS̃ 412 and ṼS̃ 414) are obtained directly from the SVD of the entire matrix
W̃ 402 in (4) using a latent semantic adaptation framework 400 as defined in the
equations that follow. By inspection from FIG. 4,

    C = \tilde{U}_2 \tilde{S} \tilde{V}_1^T ,    (9)

    D = \tilde{U}_1 \tilde{S} \tilde{V}_2^T ,    (10)

and

    W = \tilde{U}_1 \tilde{S} \tilde{V}_1^T ,    (11)

    E = \tilde{U}_2 \tilde{S} \tilde{V}_2^T ,    (12)



where Ũ 406 and Ṽ 410 are each column-orthonormal, i.e., Ũ^T Ũ = Ṽ^T Ṽ = I_R (the
identity matrix of order R). The orthogonality constraints can also be expressed in terms
of Ũ_1, Ũ_2, Ṽ_1, and Ṽ_2 as follows:

    \tilde{U}^T \tilde{U} = I_R = \tilde{U}_1^T \tilde{U}_1 + \tilde{U}_2^T \tilde{U}_2 ,    (13)

    \tilde{V}^T \tilde{V} = I_R = \tilde{V}_1^T \tilde{V}_1 + \tilde{V}_2^T \tilde{V}_2 .    (14)

In one embodiment, the foregoing equations (9)-(14) define the latent semantic adaptation
framework 400 of the method of the present invention. The latent semantic adaptation
framework 400 is used to solve for the "extension" SVD matrices Ũ 406, S̃ 408, and
Ṽ 410 as a function of the original SVD matrices U 306, S 308, V 310, and "extension"
SVD matrices Y 426 and Z 428.

According to one embodiment, the solution is obtained by setting up a latent
semantic adaptation transformation 500, as illustrated in FIG. 5, based on the assumptions
previously noted that the dimension R of the original LSA space S 316 is low enough
that none of the corresponding R singular values are zero, and that the transformation
necessary to adapt the original LSA space S 316 is invertible. Starting with S̃ 408, the
shift from S 308 in FIG. 3 to S̃ 408 in FIG. 4 can be captured as illustrated in FIG. 5 by
the following expressions:

    \tilde{U}_1 = U G ,    (15)

    \tilde{V}_1 = V H ,    (16)

where G 508 and H 518 are (R × R) matrices that, according to the second assumption,
are assumed to be invertible. Taken together, (15) and (16) define a latent semantic
adaptation matrix transformation 500 to apply to the original SVD matrices U 306 and
V 310 to update them according to the new data.

It is fairly straightforward to show that matrix transformation 500 also applies to
the "extension" SVD matrices Y 426 and Z 428 resulting from the "folding-in" process,
designated in FIG. 4 as Ũ_2 438 and Ṽ_2 440, respectively. Specifically, first note that (1)
and (11), together with (15) and (16) and the orthogonality properties of U and V, lead to:

    S = G \tilde{S} H^T .    (17)

Since matrices G 508 and H 518 are invertible and, in accordance with the first
assumption, both S 308 and S̃ 408 are assumed to contain no zero singular value,
(17) implies the following equivalent identities:

    G = S H^{-T} \tilde{S}^{-1} ,    (18)

    H = S G^{-T} \tilde{S}^{-1} ,    (19)

where the latter identity exploits the fact that S 308 and S̃ 408 are diagonal.

On the other hand, equations (5) and (9), together with (16), can be expressed as:

    Y S = \tilde{U}_2 \tilde{S} H^T ,    (20)

while equations (6) and (10), together with (15), yield:

    S Z^T = G \tilde{S} \tilde{V}_2^T .    (21)

Thus, exploiting equation (18) in (20) and equation (19) in (21), Ũ_2 438 and Ṽ_2 440 can be
obtained, after re-arranging, as:

    \tilde{U}_2 = Y G ,    (22)

    \tilde{V}_2 = Z H .    (23)



Equations (22)-(23) provide a convenient way to express the "extension" SVD matrices
Ũ 406 and Ṽ 410 in terms of the matrices G 508 and H 518 of the latent semantic
adaptation matrix transformation 500 postulated in expressions (15)-(16), as follows:

    \tilde{U} = \begin{bmatrix} U \\ Y \end{bmatrix} G ,    (24)

    \tilde{V} = \begin{bmatrix} V \\ Z \end{bmatrix} H .    (25)

The issue is now to solve for matrices G 508 and H 518 as a function of the original
entities.
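As a numerical sanity check on (15)-(17) and (22)-(23), the following Python/NumPy sketch builds a small synthetic example. It is illustrative only: the dimensions, the random seed, and all variable names are assumptions. The extended matrix is deliberately constructed to have exactly rank R and to satisfy the folding-in relations, so the identities hold to machine precision rather than approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, m, n, R = 6, 5, 2, 3, 3          # hypothetical sizes

# Original SVD factors: column-orthonormal U, V and diagonal S.
U, _ = np.linalg.qr(rng.standard_normal((M, R)))
V, _ = np.linalg.qr(rng.standard_normal((N, R)))
S = np.diag([3.0, 2.0, 1.0])

# "Folded-in" factors for the new words (Y) and the new documents (Z).
Y = 0.5 * rng.standard_normal((m, R))
Z = 0.5 * rng.standard_normal((n, R))

# Extended matrix [[W, D], [C, E]], built here to be exactly rank R.
W_ext = np.vstack([U, Y]) @ S @ np.vstack([V, Z]).T

# Rank-R SVD of the extended matrix.
Uf, sf, Vft = np.linalg.svd(W_ext, full_matrices=False)
U_t, S_t, V_t = Uf[:, :R], np.diag(sf[:R]), Vft[:R, :].T

# Recover G and H from (15)-(16), using the orthonormality of U and V.
G = U.T @ U_t[:M]
H = V.T @ V_t[:N]

err_17 = np.abs(G @ S_t @ H.T - S).max()   # (17): S = G S~ H^T
err_22 = np.abs(U_t[M:] - Y @ G).max()     # (22): U~_2 = Y G
err_23 = np.abs(V_t[N:] - Z @ H).max()     # (23): V~_2 = Z H
print(err_17, err_22, err_23)
```

With real co-occurrence data the extended matrix is not exactly rank R, so these relations should be expected to hold only approximately.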
The first step in solving for matrices G 508 and H 518 is to take advantage of
equations (15)-(16) and (24)-(25) in (13)-(14). The orthogonality properties of U 306
and V 310 are used to obtain:

    G^T (I_R + Y^T Y) G = I_R ,    (26)

    H^T (I_R + Z^T Z) H = I_R .    (27)
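For reference, the step leading to (26) can be written out as follows. This is a sketch of the intermediate algebra implied by the text, using only (13), (24), and the column-orthonormality of U 306; the derivation of (27) is analogous, with V, Z, and H in place of U, Y, and G.

```latex
\begin{aligned}
I_R = \tilde{U}^T \tilde{U}
  &= G^T \begin{bmatrix} U^T & Y^T \end{bmatrix}
         \begin{bmatrix} U \\ Y \end{bmatrix} G \\
  &= G^T \left( U^T U + Y^T Y \right) G \\
  &= G^T \left( I_R + Y^T Y \right) G .
\end{aligned}
```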

Again invoking the assumption regarding the non-singularity of matrices G 508
and H 518, equations (26)-(27) may be re-written as:

    G^{-T} G^{-1} = (G G^T)^{-1} = (I_R + Y^T Y) ,    (28)

    H^{-T} H^{-1} = (H H^T)^{-1} = (I_R + Z^T Z) ,    (29)

or, equivalently:

    (G G^T) = (I_R + Y^T Y)^{-1} ,    (30)

    (H H^T) = (I_R + Z^T Z)^{-1} .    (31)
Equations (30)-(31) can in turn be used to derive G 508 as a function of "extension" SVD
matrix Y 426, and H 518 as a function of "extension" SVD matrix Z 428. Note that the
inverse appearing in the right hand side of equations (30) and (31) may not have to be
computed directly. Recall the well-known matrix identity:

    (A + P^T Q)^{-1} = A^{-1} - (A^{-1} P^T)(I + Q A^{-1} P^T)^{-1}(Q A^{-1}) ,    (32)

for any nonsingular (square) matrix A and matrices P, Q with compatible dimensions.
Applied to (30)-(31), this results in:

    (G G^T) = I_R - Y^T (I_m + Y Y^T)^{-1} Y ,    (33)

    (H H^T) = I_R - Z^T (I_n + Z Z^T)^{-1} Z ,    (34)

which may be computationally beneficial if m < R and/or n < R. Regardless of how the
inverse is computed, once the right hand side is known, the computation of matrices G
508 and H 518 can be done efficiently through Choleski decomposition, or, in the
symmetric case, through matrix square root computation.
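The equivalence of (30) and (33), and the subsequent factorization, can be illustrated with a short Python/NumPy sketch. The function names, the sizes (m = 3, R = 8), and the random test matrix are assumptions made for illustration; NumPy's `cholesky` returns a lower-triangular factor, which is one admissible choice of G among several.

```python
import numpy as np

def ggt_direct(Y):
    """(30): invert the R x R matrix I_R + Y^T Y directly."""
    R = Y.shape[1]
    return np.linalg.inv(np.eye(R) + Y.T @ Y)

def ggt_small_inverse(Y):
    """(33): only an m x m system is solved, which is cheaper when m < R."""
    m, R = Y.shape
    return np.eye(R) - Y.T @ np.linalg.solve(np.eye(m) + Y @ Y.T, Y)

rng = np.random.default_rng(1)
Y = rng.standard_normal((3, 8))        # hypothetical: m = 3 new words, R = 8

GGt = ggt_small_inverse(Y)
GGt = (GGt + GGt.T) / 2                # symmetrize against round-off
G = np.linalg.cholesky(GGt)            # lower-triangular G with G G^T = GGt

# G obtained this way satisfies the constraint (26).
residual_26 = np.abs(G.T @ (np.eye(8) + Y.T @ Y) @ G - np.eye(8)).max()
```

The same two functions apply to H 518 by passing Z in place of Y.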
To be able to derive the new vectors ŨS̃ 412 and ṼS̃ 414 that are to populate the
new LSA space S̃ 416, the matrix S̃ 408 must be expressed as a function of known
quantities. But observe from equations (24)-(25) that:

    \tilde{U} \tilde{S} = \begin{bmatrix} U \\ Y \end{bmatrix} G \tilde{S} ,    (35)

    \tilde{V} \tilde{S} = \begin{bmatrix} V \\ Z \end{bmatrix} H \tilde{S} .    (36)

Thus, it is sufficient to find suitable expressions for GS̃ and HS̃.
From equations (17), (28), and (29), it is clear that:

    (G \tilde{S})(G \tilde{S})^T = G \tilde{S}^2 G^T = S H^{-T} H^{-1} S = S (I_R + Z^T Z) S ,    (37)

    (H \tilde{S})(H \tilde{S})^T = H \tilde{S}^2 H^T = S G^{-T} G^{-1} S = S (I_R + Y^T Y) S .    (38)

Thus, it is also possible to obtain GS̃ and HS̃ directly through Choleski decomposition,
in a manner analogous to that mentioned above for G 508 and H 518. In fact, as illustrated
in FIG. 6, if J 608 and K 618 are the solutions of relevant Choleski decompositions,
viz.:

    J J^T = (I_R + Y^T Y) ,    (39)

    K K^T = (I_R + Z^T Z) ,    (40)

then equations (35)-(38) admit as solutions:

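    \tilde{U} \tilde{S} = \begin{bmatrix} U S \\ Y S \end{bmatrix} K ,    (41)

    \tilde{V} \tilde{S} = \begin{bmatrix} V S \\ Z S \end{bmatrix} J .    (42)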
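One way to see why (41) is a solution, sketched here as an aid to the reader: substituting (40) into (37) shows that GS̃ and SK have the same Gram matrix, so GS̃ = SK is an admissible choice. Any other solution differs from it only by right-multiplication with an orthogonal matrix, which leaves inner products between the scaled vectors unchanged. Substituting into (35) then gives (41); (42) follows in the same way from (38), (39), and (36).

```latex
\begin{aligned}
(G\tilde{S})(G\tilde{S})^T &= S\,(I_R + Z^T Z)\,S = S K K^T S = (S K)(S K)^T ,
  \quad \text{so one may take} \quad G\tilde{S} = S K , \\
\tilde{U}\tilde{S} &= \begin{bmatrix} U \\ Y \end{bmatrix} G\tilde{S}
  = \begin{bmatrix} U \\ Y \end{bmatrix} S K
  = \begin{bmatrix} U S \\ Y S \end{bmatrix} K .
\end{aligned}
```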



In other words, in accordance with one embodiment of the present invention, the
original vectors US 312 and VS 314, as well as the new vectors resulting from the
"folding-in" process YS 442 and ZS 444, can be transformed using a latent semantic
adaptation vector transformation 600 defined by the transformation matrices K 618 and
J 608 to respectively yield the updated word vectors ŨS̃ 412 and document vectors ṼS̃
414. Therefore, equations (41) and (42) make it possible to adapt the original LSA space
S 316 of FIG. 3 to the new LSA space S̃ 416 of FIG. 4.
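The vector transformation of (39)-(42) can be sketched in a few lines of Python/NumPy. The function name `adapt_vectors`, the argument layout, and the synthetic test data are assumptions for illustration, not the patented implementation. The data are constructed so that the extended matrix has exactly rank R, in which case the inner products among the adapted word vectors (and among the adapted document vectors) match those implied by the extended matrix itself.

```python
import numpy as np

def adapt_vectors(U, S, V, Y, Z):
    """Two-sided update of the scaled word and document vectors, per (39)-(42)."""
    R = S.shape[0]
    J = np.linalg.cholesky(np.eye(R) + Y.T @ Y)   # (39): J J^T = I_R + Y^T Y
    K = np.linalg.cholesky(np.eye(R) + Z.T @ Z)   # (40): K K^T = I_R + Z^T Z
    word_vecs = np.vstack([U @ S, Y @ S]) @ K     # (41): rows are word vectors
    doc_vecs = np.vstack([V @ S, Z @ S]) @ J      # (42): rows are document vectors
    return word_vecs, doc_vecs

# Hypothetical synthetic example.
rng = np.random.default_rng(0)
M, N, m, n, R = 6, 5, 2, 3, 3
U, _ = np.linalg.qr(rng.standard_normal((M, R)))
V, _ = np.linalg.qr(rng.standard_normal((N, R)))
S = np.diag([3.0, 2.0, 1.0])
Y = 0.3 * rng.standard_normal((m, R))
Z = 0.3 * rng.standard_normal((n, R))
W_ext = np.vstack([U, Y]) @ S @ np.vstack([V, Z]).T

word_vecs, doc_vecs = adapt_vectors(U, S, V, Y, Z)
```

Only two R × R Choleski factorizations and two tall-by-small matrix products are needed; no SVD of the extended matrix is computed.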

In one embodiment of the latent semantic adaptation framework 400, the new
information, as reflected through the transformation matrices K 618 and J 608, affects
both original word and document vectors u_i 318 and v_j 320 and new word and document
vectors ỹ_i 446 and z̃_j 448, referred to as two-sided adaptation. Stated another way, the
transformed representation of the new word and document vectors ỹ_i 446 and z̃_j 448
takes into account its own influence on the underlying semantic knowledge that was
encapsulated in the original LSA space S 316 of FIG. 3 (i.e. the existing word and
document vectors u_i 318 and v_j 320) to yield the transformed word and document
vectors ũ_i 418 and ṽ_j 420 that populate the new LSA space S̃ 416 of FIG. 4. As
indicated by the arrows in the new LSA space S̃ 416 of FIG. 4, the positions of both the
words and documents represented by original word and document vectors u_i 318 and
v_j 320 have shifted from their positions in the original LSA space S 316 to reflect their
changed position (i.e. their relationship) within the new LSA space S̃ 416. The new LSA
space S̃ 416 not only allows for improvements in the misclassification error rate, but also
provides the ability to adapt the speech recognition database that embodies the new LSA
space S̃ 416 in real-time, because the application of the transformation matrices K 618
and J 608 is computationally efficient and by-passes the need to re-compute the LSA
space.

In another embodiment of the latent semantic adaptation framework 400 of the
present invention, only the new documents are considered and not the new words
appearing in those new documents, referred to as one-sided adaptation. As in two-sided
adaptation, one-sided adaptation does not simply "fold-in" the new documents, but rather
generates a transformed representation of the new document vectors ṽ_j 420 to generate a
new LSA space S̃ 416. While not as dramatic an improvement of the misclassification
rate is obtained with one-sided adaptation as can be obtained with two-sided adaptation,
in certain applications of a speech recognition system 100, one-sided adaptation may be
sufficient to allow for real-time adaptation of the original LSA space S 316.

In addition to providing improved performance through lowering the
misclassification rate, it is also worth noting that the latent semantic adaptation
framework 400 and resulting latent semantic adaptation matrix and vector
transformations 500 and 600 respectively are computationally efficient. Compared to the
"folding-in" computations of the baseline adaptation approach, the latent semantic
adaptation matrix and vector transformations 500 and 600 of the latent semantic
adaptation framework 400 entail less overhead. For example, in terms of the number of
floating point operations required, the overhead associated with the latent semantic
adaptation vector transformations 600 embodied in equations (39)-(42) can be expressed
as:

    N_{ops} = \tfrac{2}{3} R^3 + [(M + N) + 2(m + n) - 1] R^2 + (m + n + 1) R .    (43)

For typical values of the various dimensions involved, expression (43) will be dominated
by (M + N)R^2. Depending on the application, this quantity may fall anywhere between
about 50 million (for voice command and control types of speech recognition
applications using a limited vocabulary) and more than 1 billion (for large vocabulary
transcription). Still, on current high-end machines, this quantity only represents up to a
few seconds of central processor unit (CPU) time. Compared to recomputing the SVD
from scratch, which requires O(MNR) operations, the computational complexity is
reduced by a factor of approximately min(M, N)/R. In many speech recognition
applications, the reduction factor will be on the order of 1000. In such cases, the latent
semantic adaptation framework 400 and resulting latent semantic adaptation matrix and
vector transformations 500 and 600 make it practical to adapt the new LSA space S̃ 416
with real-time word and document updates, whereas SVD re-computation would
generally not be feasible.
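To make the orders of magnitude concrete, the dominant term and the approximate reduction factor can be evaluated for sample sizes. The values of M, N, and R below are hypothetical, chosen only to land near the two ends of the range quoted above, and the O(MNR) figure ignores constant factors, so the ratio is a rough guide only.

```python
def dominant_ops(M, N, R):
    """Dominant term of the adaptation cost: (M + N) * R^2."""
    return (M + N) * R ** 2

def approx_reduction(M, N, R):
    """Ratio of the O(MNR) recomputation cost to the dominant adaptation term."""
    return (M * N * R) / dominant_ops(M, N, R)

# Hypothetical command-and-control task: 2,000 words, 3,000 documents.
command_and_control = dominant_ops(M=2_000, N=3_000, R=100)

# Hypothetical large-vocabulary task: 20,000 words, 100,000 documents.
large_vocabulary = dominant_ops(M=20_000, N=100_000, R=100)

ratio = approx_reduction(M=20_000, N=100_000, R=100)
bound = min(20_000, 100_000) / 100     # the min(M, N)/R estimate
```

The ratio MN/((M + N)R) never exceeds min(M, N)/R and approaches it when one of M and N is much larger than the other, which is why min(M, N)/R serves as the approximation in the text.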
In some embodiments of the latent semantic adaptation framework 400 of the
present invention it may be better to proceed separately in the estimation of the scaled
row vectors ŨS̃ 412 and ṼS̃ 414 (or possibly, the transformation matrices G 508 and
H 518) that comprise the transformed word and document vectors ũ_i 418 and ṽ_j 420
respectively, while in other embodiments it may be better to first compute one and then
use the results to compute the other. The choice depends on what piece of evidence can
be considered the most reliable.

For example, in speech recognition applications for dictation it is likely that the
new data would be primarily new documents, with a few occasional new words. This is
because the vocabulary there is very large (e.g., 60,000 words), so most of the words in a
new document (e.g., letter) would already be known. In speech recognition applications
for command and control, on the other hand, the vocabulary is more limited, so it is likely
that the new data would contain new words. In the context of semantic inference, for
example, the same command can be expressed in alternative ways, using different words,
e.g. "make a new spreadsheet" when the pre-defined wording of the command is "Open
Microsoft Excel."

In one embodiment, if the new data primarily involves adding new documents,
then both matrix D 432 and matrix E 434 would be reliable starting points since they
contain the new document and new word data for the new documents, while matrix
C 430 might not, since it contains only the new word data. In that case, it follows that
"extension" SVD matrix Z 428 (computed from equation (8)) is more reliable than
"extension" SVD matrix Y 426 (computed from equation (7)), which means that
transformation matrix K 618 is more reliable than transformation matrix J 608.
Consequently, it is better to first compute the transformed word vectors ũ_i 418 from ŨS̃
412 using the transformation matrix K 618 in equation (41). Instead of computing
"extension" SVD matrix Y 426 from the less reliable matrix C 430 in equation (7),
"extension" SVD matrix Y 426 can now be obtained from equation (44) as:
    E = Y S Z^T ,    (44)

which, in turn, implies:

    Y = E Z (Z^T Z)^{-1} S^{-1} .    (45)

Thus, if the new data primarily involves adding new documents, then "extension" SVD
matrix Y 426, i.e. the new word data, can be secondarily obtained through (45) for use in
(39) and then (42), which completes the transformation.

On the other hand, in an alternate embodiment, if the new data primarily involves
adding new words, then both matrices C 430 and E 434 would be reliable starting points
since they contain all of the new word data for the new documents, while matrix D 432
might not, since it contains only the new document data. In that case, it follows that
"extension" SVD matrix Y 426 (computed from equation (7)) is more reliable than
"extension" SVD matrix Z 428 (computed from equation (8)), which means that
transformation matrix J 608 is more reliable than transformation matrix K 618. In that
case it is advisable to first compute the transformed document vectors ṽ_j 420 from ṼS̃
414 using transformation matrix J 608 in equation (42). Instead of computing
"extension" SVD matrix Z 428 from the less reliable matrix D 432 in equation (8),
"extension" SVD matrix Z 428 can now be obtained from equation (44) as:

    Z = E^T Y (Y^T Y)^{-1} S^{-1} .    (46)

Equation (46) can, in turn, be used in (40) and then (41), which completes the
transformation.

From the foregoing, it is apparent that equations (45) and (46) provide a
convenient way to compute "extension" SVD matrix Y 426 as a function of "extension"
SVD matrix Z 428, or vice versa, as necessary. It should be noted that other alterations
in the order of computation of the equations of the matrix and vector transformations 500
and 600 resulting from the latent semantic adaptation framework 400 might be used to
accommodate the different word and document usage characteristics of different speech
recognition systems and applications without departing from the scope of the invention.
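Equations (45) and (46) can be checked on synthetic data with the following Python/NumPy sketch. The function names and sizes are assumptions for illustration. Note that Z^T Z and Y^T Y are R × R matrices, so the inverses exist only when Z and Y have full column rank, which requires n ≥ R and m ≥ R respectively; the example is sized accordingly.

```python
import numpy as np

def y_from_z(E, Z, S):
    """(45): Y = E Z (Z^T Z)^{-1} S^{-1}."""
    return E @ Z @ np.linalg.inv(Z.T @ Z) @ np.linalg.inv(S)

def z_from_y(E, Y, S):
    """(46): Z = E^T Y (Y^T Y)^{-1} S^{-1}."""
    return E.T @ Y @ np.linalg.inv(Y.T @ Y) @ np.linalg.inv(S)

rng = np.random.default_rng(2)
m, n, R = 4, 5, 3                      # hypothetical sizes with m >= R and n >= R
S = np.diag([3.0, 2.0, 1.0])
Y = rng.standard_normal((m, R))
Z = rng.standard_normal((n, R))
E = Y @ S @ Z.T                        # (44)

Y_rec = y_from_z(E, Z, S)              # recovers Y from E, Z, and S
Z_rec = z_from_y(E, Y, S)              # recovers Z from E, Y, and S
```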
Turning now to FIG. 8, the particular methods of the invention are described in
terms of computer software with reference to a flowchart. The methods to be performed
by a computer constitute computer programs made up of computer-executable
instructions. Describing the methods by reference to a flowchart enables one skilled in
the art to develop such programs including such instructions to carry out the methods on
suitably configured computers (the processor of the computer executing the instructions
from computer-accessible media). The computer-executable instructions may be written
in a computer programming language or may be embodied in firmware logic. If written
in a programming language conforming to a recognized standard, such instructions can be
executed on a variety of hardware platforms and for interface to a variety of operating
systems. In addition, the present invention is not described with reference to any
particular programming language. It will be appreciated that a variety of programming
languages may be used to implement the teachings of the invention as described herein.
Furthermore, it is common in the art to speak of software, in one form or another (e.g.,
program, procedure, process, application...), as taking an action or causing a result. Such
expressions are merely a shorthand way of saying that execution of the software by a
computer causes the processor of the computer to perform an action or produce a result.

FIG. 8 is a flowchart illustrating the process followed in using latent semantic
adaptation to generate a new LSA space S̃ 416 according to one embodiment of the
present invention. Initially, at process block 810, a speech recognition system 100, such
as the one illustrated in FIG. 1, generates an original LSA space S 316 from a singular
value decomposition (SVD) of a word-document correlation matrix W 302 of word-document
co-occurrences in a training text corpus T 202 of words and documents
representative of a particular domain and style of language composition. The SVD
results in original left and right singular matrices U 306 and V 310, respectively, and
diagonal matrix S 308, where the rows of US 312 and VS 314 characterize the position
of the words and documents of training text corpus T 202 in the original LSA space S
316.

In one embodiment, after the training process 810 is completed, an adapted LSA
space derivation unit 111 employed in the speech recognition system 100 continues at
process block 820 to gather new words and documents in the extension matrices C 430,
D 432 and E 434, which extend the data in the original word-document correlation
matrix W 302 (i.e. the new words and documents not present in the original training text
corpus T 202). Using an SVD, the adapted LSA space derivation unit 111 obtains the
"extension" SVD matrices Y 426 and Z 428 which extend the original left and right
singular matrices U 306 and V 310 by "folding-in" the new words and new documents,
respectively.
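Process blocks 810 and 820 can be sketched as follows in Python/NumPy. The function names `train_lsa` and `fold_in` are hypothetical, and the folding-in formulas are written on the assumption, consistent with (20)-(21), that the folding-in relations read C = Y S V^T and D = U S Z^T; equations (7)-(8) remain the authoritative form. The synthetic data are exactly rank R, so the relations are reproduced exactly; on real data folding-in is only a projection onto the original LSA space.

```python
import numpy as np

def train_lsa(W, R):
    """Process block 810: rank-R SVD of the M x N co-occurrence matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :R], np.diag(s[:R]), Vt[:R, :].T

def fold_in(C, D, U, S, V):
    """Process block 820: solve C = Y S V^T and D = U S Z^T for Y and Z."""
    S_inv = np.diag(1.0 / np.diag(S))
    Y = C @ V @ S_inv          # one row per new word
    Z = D.T @ U @ S_inv        # one row per new document
    return Y, Z

# Hypothetical rank-R data: M words, N documents, m new words, n new documents.
rng = np.random.default_rng(3)
M, N, m, n, R = 7, 6, 2, 3, 3
A = rng.standard_normal((M + m, R))
B = rng.standard_normal((N + n, R))
W_ext = A @ B.T
W, D = W_ext[:M, :N], W_ext[:M, N:]
C = W_ext[M:, :N]

U, S, V = train_lsa(W, R)
Y, Z = fold_in(C, D, U, S, V)
```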

In one embodiment, processing continues at process block 830, where the adapted
LSA space derivation unit 111 may optionally apply the transformation matrices G 508
and H 518 to the left and right singular matrices U 306 and V 310, to update them with
the new words and new documents, respectively. In addition, in one embodiment, the
adapted LSA space derivation unit 111 may optionally apply transformation matrices G
508 and H 518 to the "extension" SVD matrices Y 426 and Z 428, to update them with
the new words and documents, respectively. In normal operation, however, process block
830 is unnecessary since only the updated word and document vectors ŨS̃ 412 and ṼS̃
414 are actually needed to generate the new LSA space S̃ 416 (as described in process
block 840, below).

In one embodiment, processing continues at process block 840, where the adapted
LSA space derivation unit 111 derives the updated word and document vectors ŨS̃ 412
and ṼS̃ 414 by applying the transformation matrices K 618 and J 608 to the word and
document vectors US 312 and VS 314, as well as the extended word and document
vectors YS 442 and ZS 444, respectively.



04860.P2638 






Express Mail No. EL034435545US 



In general, a significant improvement in the misclassification error rate of the
speech recognition system 100 is obtained by simultaneously updating both the word and
document vectors. However, in some embodiments of latent semantic adaptation, the
improvement may be even more significant as well as more efficient if, at processing
blocks 820 - 840, the adapted LSA space derivation unit 111 first derives the updated
word vectors ŨS̃ 412, and then derives the document vectors ṼS̃ 414, or vice versa. This
is because when the new data primarily involves adding new documents, for example, the
"extension" SVD matrix Y 426 can be obtained from the values already computed for E
434, Z 428, and S 308 using equation (45), whereas when the new data primarily
involves adding new words, the "extension" SVD matrix Z 428 can be obtained from
the values already computed for E 434, Y 426, and S 308 using equation (46). In other
embodiments, it may be sufficient to process only the new documents using a one-sided
adaptation approach instead of the full two-sided adaptation.

In one embodiment, processing continues at process block 850, where the adapted
LSA space derivation unit 111 generates the new LSA space S̃ 416 by populating it with
the newly derived updated word and document vectors ŨS̃ 412 and ṼS̃ 414 obtained at
process block 840, i.e. the scaled row vectors ũ_i 418 and ṽ_j 420.

FIG. 9 illustrates one embodiment of a computing device suitable for use with one
embodiment of the present invention. As illustrated, the speech recognition system 100 of
FIG. 1 may be implemented on a computer system 900. Computer system 900 includes
processor 902, display device 906, and input/output (I/O) devices 908, coupled to each
other via a bus 910. Additionally, a memory subsystem 912, which can include one or
more of cache memories, system memory (RAM), and nonvolatile storage devices (e.g.,
magnetic or optical disks) is also coupled to bus 910 for storage of instructions and data
for use by processor 902. I/O devices 908 represent a broad range of input and output
devices, including keyboards, cursor control devices (e.g., a trackpad or mouse),
microphones to capture the voice data, speakers, network or telephone communication
interfaces, printers, etc. Computer system 900 also includes well-known audio
processing hardware and/or software to transform analog voice data to a digital form
which can be processed by the speech recognition system 100 implemented in computer
system 900. In addition to personal computers, laptop computers, and workstations, in
some embodiments, computer system 900 may be incorporated in a mobile computing
device such as a personal digital assistant (PDA) or mobile telephone without departing
from the scope of the invention.

Components 902 - 912 of computer system 900 perform their conventional
functions known in the art. Collectively, these components are intended to represent a
broad category of hardware systems, including but not limited to general purpose
computer systems based on the PowerPC® processor family of processors available from
Motorola, Inc. of Schaumburg, Illinois, or the Pentium® processor family of processors
available from Intel Corporation of Santa Clara, California.

It is to be appreciated that various components of computer system 900 may be
re-arranged, and that certain implementations of the present invention may not require nor
include all of the above components. For example, a display device may not be included
in system 900. Additionally, multiple buses (e.g., a standard I/O bus and a high
performance I/O bus) may be included in system 900. Furthermore, additional
components may be included in system 900, such as additional processors (e.g., a digital
signal processor), storage devices, memories, network/communication interfaces, etc.

In the illustrated embodiment of FIG. 9, the method and apparatus for speech
recognition using latent semantic adaptation with word and document updates according
to the present invention as discussed above is implemented as a series of software
routines run by computer system 900 of FIG. 9. These software routines comprise a
plurality or series of instructions to be executed by a processing system in a hardware
system, such as processor 902 of FIG. 9. Initially, the series of instructions are stored on
a storage device of memory subsystem 912. It is to be appreciated that the series of
instructions can be stored using any conventional computer-readable or machine-accessible
storage medium, such as a diskette, CD-ROM, magnetic tape, DVD, ROM,
Flash memory, etc. It is also to be appreciated that the series of instructions need not be
stored locally, and could be stored on a propagated data signal received from a remote
storage device, such as a server on a network, via a network/communication interface.
The instructions are copied from the storage device, such as mass storage, or from the
propagated data signal into a memory subsystem 912 and then accessed and executed by
processor 902. In one implementation, these software routines are written in the C++
programming language. It is to be appreciated, however, that these routines may be
implemented in any of a wide variety of programming languages.

These software routines are illustrated in memory subsystem 912 as speech
recognition instructions 920, latent semantic adaptation instructions 922, latent semantic
classification instructions 924, and action generation instructions 923. Also illustrated
are analog to digital (A/D) transformation instructions 925, acoustic model(s) 926, and
language model(s) 927 that support the speech recognition system 100.
In the illustrated embodiment, the memory subsystem 912 of FIG. 9 also includes
the semantic anchors 928 that comprise the word and document vectors of the LSA
spaces 312 and 412. In one embodiment, the semantic anchors 928 are implemented in a
speech recognition database using any of a wide variety of database formats known in the
art. As with the software instructions, the semantic anchors 928 may be copied from a
storage device, such as mass storage, or from a propagated data signal into the memory
subsystem 912 and then accessed and executed by processor 902.

In alternate embodiments, the present invention is implemented in discrete
hardware or firmware. For example, one or more application specific integrated circuits
(ASICs) could be programmed with the above described functions of the present
invention. By way of another example, speech recognition unit 104, adapted LSA space
derivation unit 111, semantic classification unit 112, and application unit 118 of FIG. 1
could be implemented in one or more ASICs of an additional circuit board for insertion
into hardware system 900 of FIG. 9.

In the discussions above, the present invention is described with reference to
speech recognition systems. It is to be appreciated, however, that alternate embodiments
of the present invention can be used with other types of pattern recognition systems, such
as visual rather than audio pattern recognition, handwriting recognition systems (e.g.,
optical character recognition (OCR)), etc.

It is to be appreciated that the method and apparatus for speech recognition using
latent semantic adaptation with word and document update of the present invention can
be employed in any of a wide variety of manners. By way of example, a speech
recognition system employing latent semantic adaptation with word and document update
could be used in conventional personal computers, security systems, home entertainment
or automation systems, etc.

The performance of the above system was tested in the context of the "Speakable
Items" desktop command and control task defined on the MacOS operating system. FIG.
10 illustrates a graph 1000 showing the misclassification error rates 1002 of five different
approaches to command classification versus the number of variants 1004 of a collection
of canonical commands from a number of different speakers. During performance
testing, the five different approaches that were considered were: (i) full re-computation of
the LSA space from scratch 1006, which served as a benchmark against which to measure
the other setups; (ii) constant dimension re-computation of the LSA space 1008, where
the LSA space is re-computed from scratch, but the LSA dimension remains constant at R
= 100; (iii) baseline adaptation of the LSA space 1010 using traditional "folding-in" of
the new words and documents in the existing LSA space; (iv) one-sided adaptation of the
LSA space 1012, where only the new documents are used to adapt the LSA space using
one embodiment of latent semantic adaptation in accordance with the present invention;
and (v) two-sided adaptation of the LSA space 1014, where both the new documents as
well as the new words contained in the new documents are used to adapt the LSA space
using one embodiment of latent semantic adaptation in accordance with the present
invention.

As shown in FIG. 10, the two-sided adaptation approach 1014 remains
competitive with constant dimension re-computation 1008 when the new words and
documents vary from the original training corpus. In other words, the two-sided
adaptation approach 1014 achieves a lower misclassification error rate 1002 than the
baseline adaptation approach 1010 as the number of variant instances 1004 increases.
Although less optimal than the two-sided approach, the same holds true for the one-sided
adaptation approach 1012. Moreover, the competitive performance of the one-sided and
two-sided latent semantic adaptation approaches 1012 and 1014 is achieved at a
computation cost only slightly greater than the traditional "folding-in" of the baseline
adaptation approach 1010, which has a much lower level of performance.

Therefore, a method and apparatus for speech recognition using latent semantic
adaptation has been described. An audio input is provided to a speech recognizer that
identifies the words in the input. The sequence of words comprising the audio input is
then provided to an adapted LSA space generation unit that first trains the speech
recognition system by generating an LSA space that reflects the semantic knowledge
represented by a training corpus of words and documents, and then continually adapts the
LSA space to reflect the semantic knowledge represented by the new words and
documents as they become available. The resulting adapted LSA space can then be provided
to an application such as a semantic classifier that classifies the audio input as
corresponding to a particular command, word, or sequence of words, depending on the
application. This classification process is advantageously based on an adapted semantic
representation of all of the words and documents that comprise the audio input rather than
on a semantic representation of just the words and documents in the original training
corpus. Thus, any application that needs to quickly and accurately recognize a particular
speech utterance associated with a semantic representation may employ the adapted
semantic representation provided by the method and apparatus of the present invention.
The adapted semantic representation of all of the words and documents comprising the
audio input advantageously allows the present invention to accurately recognize speech in
real-time, even when the speech includes words and documents not in the original
training corpus. Finally, whereas many alterations and modifications of the present
invention will be comprehended by a person skilled in the art after having read the
foregoing description, it is to be understood that the particular embodiments shown and
described by way of illustration are in no way intended to be considered limiting.
References to details of particular embodiments are not intended to limit the scope of the
claims.